153 research outputs found

    Faster Random Walks By Rewiring Online Social Networks On-The-Fly

    Many online social networks feature restrictive web interfaces that only allow querying a user's local neighborhood. To enable analytics over such a network through its restrictive interface, many recent efforts reuse existing Markov Chain Monte Carlo methods, such as random walks, to sample the social network and support analytics based on the samples. The problem with this approach, however, is the large number of queries often required (i.e., a long "mixing time") for a random walk to reach a desired (stationary) sampling distribution. In this paper, we consider a novel problem of enabling a faster random walk over online social networks by "rewiring" the social network on-the-fly. Specifically, we develop Modified TOpology (MTO)-Sampler which, by using only information exposed by the restrictive web interface, constructs a "virtual" overlay topology of the social network while performing a random walk, and ensures that the random walk follows the modified overlay topology rather than the original one. We show that MTO-Sampler not only provably enhances the efficiency of sampling, but also achieves significant savings on query cost over real-world online social networks such as Google Plus, Epinion, etc.
    Comment: 15 pages, 14 figures, technical report for ICDE2013 paper. Appendix has all the theorems' proofs; ICDE'201
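The baseline setting in this abstract, a random walk that can only see a node's neighbors through a query interface, can be sketched as follows. This is a plain simple random walk with a query cache, not the MTO-Sampler itself. The function name `random_walk_sample`, the `neighbors` callback, and the `burn_in` parameter are illustrative assumptions, not names from the paper.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Node = Hashable


def random_walk_sample(
    neighbors: Callable[[Node], Sequence[Node]],
    start: Node,
    steps: int,
    burn_in: int = 0,
    seed: int = 0,
) -> Tuple[List[Node], int]:
    """Simple random walk over a graph visible only through `neighbors`.

    Returns the nodes visited after `burn_in` steps and the number of
    distinct neighborhood queries issued (the query cost).
    """
    rng = random.Random(seed)
    cache: Dict[Node, List[Node]] = {}  # each node is queried at most once

    def query(node: Node) -> List[Node]:
        if node not in cache:
            cache[node] = list(neighbors(node))
        return cache[node]

    current = start
    samples: List[Node] = []
    for step in range(steps):
        nbrs = query(current)
        if not nbrs:  # isolated node: the walk cannot continue
            break
        current = rng.choice(nbrs)  # move to a uniformly chosen neighbor
        if step >= burn_in:  # discard early steps taken before mixing
            samples.append(current)
    return samples, len(cache)


# Toy undirected graph standing in for the restricted interface (hypothetical data).
toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
samples, query_cost = random_walk_sample(toy_graph.__getitem__, start=0, steps=50, burn_in=10)
```

On a connected, non-bipartite undirected graph, a simple random walk visits nodes with probability proportional to their degree in the long run, so estimates built from `samples` usually need reweighting by inverse degree. The length of `burn_in` needed is exactly the mixing-time cost the abstract aims to reduce.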

    GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification

    Graph neural networks (GNNs) have achieved great success in node classification tasks. However, existing GNNs are naturally biased towards the majority classes with more labelled data and ignore minority classes with relatively few labelled nodes. Traditional techniques often resort to over-sampling methods, but these may cause overfitting. More recently, some works propose to synthesize additional nodes for minority classes from the labelled nodes; however, there is no guarantee that the generated nodes really represent the corresponding minority classes. In fact, improperly synthesized nodes may result in insufficient generalization of the algorithm. To resolve the problem, in this paper we seek to automatically augment the minority classes from the massive unlabelled nodes of the graph. Specifically, we propose GraphSR, a novel self-training strategy to augment the minority classes with a significant diversity of unlabelled nodes, which is based on a Similarity-based selection module and a Reinforcement Learning (RL) selection module. The first module finds a subset of unlabelled nodes that are most similar to the labelled minority nodes, and the second further determines the representative and reliable nodes from the subset via an RL technique. Furthermore, the RL-based module can adaptively determine the sampling scale according to the current training data. This strategy is general and can be easily combined with different GNN models. Our experiments demonstrate that the proposed approach outperforms the state-of-the-art baselines on various class-imbalanced datasets.
    Comment: Accepted by AAAI202
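The first stage described in this abstract, picking the unlabelled nodes most similar to the labelled minority nodes, might look roughly like the sketch below. It assumes that nodes already have embedding vectors and that similarity means cosine similarity to the minority-class centroid. Both are illustrative assumptions; the abstract does not specify the measure, and the RL-based second stage is not shown. The names `cosine` and `select_similar` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_similar(
    unlabelled: Dict[str, Sequence[float]],
    minority: List[Sequence[float]],
    k: int,
) -> List[str]:
    """Return the ids of the k unlabelled nodes closest to the minority centroid."""
    dim = len(minority[0])
    # Centroid of the labelled minority-class embeddings (assumed similarity anchor).
    centroid = [sum(vec[i] for vec in minority) / len(minority) for i in range(dim)]
    ranked = sorted(
        unlabelled,
        key=lambda node_id: cosine(unlabelled[node_id], centroid),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-D embeddings for illustration only.
minority_embeddings = [[1.0, 0.0], [0.9, 0.1]]
unlabelled_embeddings = {
    "a": [1.0, 0.05],
    "b": [0.0, 1.0],
    "c": [0.8, 0.3],
    "d": [-1.0, 0.0],
}
candidates = select_similar(unlabelled_embeddings, minority_embeddings, k=2)
```

In this reading, `candidates` is only a shortlist; the abstract assigns the final decision about which nodes are representative and reliable to the RL module.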

    Guest Editorial: Web-based services and information systems


    Vehicle trajectory clustering based on dynamic representation learning of internet of vehicles

    With the widespread use of Internet of Things, 5G, and smart city technologies, we are able to acquire a variety of vehicle trajectory data. These trajectory data are of great significance: they can be used to extract relevant information in order to, for instance, calculate the optimal path from one position to another, detect abnormal behavior, monitor the traffic flow in a city, and predict the next position of an object. One of the key technologies is vehicle trajectory clustering. However, existing methods mainly rely on manually designed metrics, which may lead to biased results. Meanwhile, the large scale of vehicle trajectory data has become a challenge because calculating these manually designed metrics costs more time and space. To address these challenges, we propose to employ network representation learning to achieve accurate vehicle trajectory clustering. Specifically, we first construct the k-nearest neighbor-based internet of vehicles in a dynamic manner. Then we learn the low-dimensional representations of vehicles by performing dynamic network representation learning on the constructed network. Finally, using the learned vehicle vectors, vehicle trajectories are clustered with machine learning methods. Experimental results on a real-world dataset show that our method achieves the best performance compared against baseline methods. © 2000-2011 IEEE.
    Note: this article has multiple authors; only the names of the first five, including Federation University Australia affiliate “Feng Xia”, are provided in this record.

    Not Every Couple Is a Pair: A Supervised Approach for Lifetime Collaborator Identification

    While scientific collaboration can be critical for a scholar, some collaborator(s) can be more significant than others, a.k.a. lifetime collaborator(s). This work-in-progress aims to investigate whether it is possible to predict/identify lifetime collaborators given a junior scholar's early profile. For this purpose, we propose a supervised approach by leveraging scholars' local and network properties. Extensive experiments on DBLP digital library demonstrate that lifetime collaborators can be accurately predicted. The proposed model outperforms baseline models with various predictors. Our study may shed light on the exploration of scientific collaborations from the perspective of life-long collaboration.

    AIDA: Legal Judgment Predictions for Non-Professional Fact Descriptions via Partial-and-Imbalanced Domain Adaptation

    In this paper, we study the problem of legal domain adaptation from an imbalanced source domain to a partial target domain. The task aims to improve legal judgment predictions for non-professional fact descriptions. We formulate this task as a partial-and-imbalanced domain adaptation problem. Deep domain adaptation has achieved cutting-edge performance in many unsupervised domain adaptation tasks; however, due to the negative transfer of samples in non-shared classes, it is hard for current domain adaptation models to solve the partial-and-imbalanced transfer problem. In this work, we explore large-scale non-shared but related class data in the source domain with a hierarchy weighting adaptation to tackle this limitation. We propose to embed a novel pArtial Imbalanced Domain Adaptation technique (AIDA) in the deep learning model, which can jointly borrow sibling knowledge from non-shared classes to shared classes in the source domain and further transfer the shared-class knowledge from the source domain to the target domain. Experimental results show that our model outperforms the state-of-the-art algorithms.
    Comment: 13 pages, 15 figures